The initial situation of our cloud migration journey
The legacy software application is an established data processing tool in the automotive industry. It was created three decades ago as a Windows desktop application, and the vast majority of its code is unmanaged. Managed code, an automation framework, a new GUI layer, and a containerized version (derived from Windows container base images) were all introduced only recently.
In the spring of 2020, a customer asked us to prove a specific use case of our legacy application in the cloud, which also required scaling the application. That request was the trigger for our cloud migration journey. Within the next few days, we formed a small team, brought in an external cloud consultant who joined the team, organized an Azure subscription, analyzed the customer’s use case, and racked our brains about how to start. After a few days, everything was in place and the journey could begin.
The pain points
This section describes some of the issues we encountered while migrating our legacy software application to Azure.
The cloud migration strategy
Finding a proper cloud migration strategy was critical. We immediately thought of two options: “refactoring (rebuild)” and “lift and shift (rehost).” The first option would have required more development effort, but would most likely have delivered better performance. We rejected it due to a lack of staff and a limited time frame. That left the second option, “lift and shift (rehost)”: putting the whole application into a container and migrating it to the cloud, while accepting a performance hit, because our legacy software application was not designed to operate in the cloud, not even as a container. In the long run, the first option would definitely have been the better choice, but the second was the right choice at the time for running a successful proof of concept of our cloud idea.[1]
Starting with an overly complex implementation
We were ready to start the first implementation: the first version of the microservice in Azure that included our application and additional components. But which services did we require, and which architecture pattern did we need to apply? We wanted to follow best practices, and fortunately we found a suitable solution for our needs in the Microsoft documentation, as shown in the image below:
The existing application was our legacy software application; its container image was uploaded to and managed in Azure Container Registry. Dedicated CLI commands were used to deploy the workload to Azure Kubernetes Service. As a result, our container was hosted on Azure Kubernetes Service, from which we established connections to other services. What could possibly have gone wrong with this plan? Murphy’s law states that anything that can go wrong will go wrong, and it proved true. The following issues arose when implementing this architecture from scratch for the first time:
- Azure Container Registry was not properly attached to the Azure Kubernetes Service
The picture below shows messages that appeared when an Azure Kubernetes Service was unable to pull images from an Azure Container Registry. This is probably a common mistake when implementing those Azure services for the first time.
- Missing fields in the YAML file for the Kubernetes deployment
As previously stated, we needed a running Windows container on the Azure Kubernetes Service that included our legacy software application. Applying the provided YAML did not deploy our container, even though the Azure Kubernetes Service had been provisioned with a Windows node. The cause was a missing field in the YAML file, the field responsible for scheduling the Pod onto the right node:
nodeSelector:
  kubernetes.io/os: windows
After adding the field to the YAML, the Pod could be scheduled onto the Windows node.[2] (A fuller manifest sketch follows after this list.)
- Wrong firewall settings
Our legacy software application only starts up successfully if it can reach specific license features. This requires a connection to a license service installed on an Azure virtual machine. We had no intention of defining incorrect firewall settings, but how often do you set up a license service, take a dynamic port range into account, and allow the necessary applications to communicate through the firewall? We did not have to do it many times, but by now we are well aware of the port range. We eventually succeeded, but it took a long time.
Each of these faults looks avoidable and simple to correct in isolation. What made them nearly impossible to pinpoint was their accumulation: multiple faults occurring at the same time, on a technology we had never used before.
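To make the first two issues more tangible, here is a rough sketch of the kind of Deployment manifest involved. All names and the image reference are hypothetical placeholders; only the nodeSelector lines come from the text above. If the container registry is not attached to the cluster, an explicit image pull secret (the commented lines) would additionally be needed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app                 # hypothetical name for the workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
    spec:
      nodeSelector:
        kubernetes.io/os: windows  # schedule the Pod onto the Windows node
      containers:
        - name: legacy-app
          image: myregistry.azurecr.io/legacy-app:1.0  # hypothetical image in the Azure Container Registry
      # imagePullSecrets:          # only needed if the registry is not attached to the cluster
      #   - name: acr-pull-secret  # hypothetical, pre-created registry secret

Applying such a manifest with kubectl apply and then inspecting the Pod events with kubectl describe pod quickly reveals both image pull problems and scheduling problems.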
After failing with that approach, we tried a different one and handed over some responsibilities to Azure. This time we used Azure Container Instances, which connected to our Azure Container Registry and ran the container without us having to provision infrastructure such as a virtual machine. Of course, we again ran into the problem of not being able to connect to the license service, but we could see that the container image was pulled and that an attempt was made to start the application inside the container. So only one problem was left to solve within a reasonable amount of time. Finally, for the first time, we had successfully deployed a container in Azure. That was significant because it demonstrated that the container could be hosted in the cloud and run in isolation, so any future issues would be limited to the Kubernetes Service and the container registry. After a while, we also fixed the license service connection and were ready to implement the use case on the Azure Kubernetes Service. We broke down the remaining barriers and, after a few days, successfully implemented the solution that included the Azure Kubernetes Service.
We now realized that we had begun with an overly complex attempt and had to take a step back to get back on track. In our case, that meant using Azure Container Instances, which let us deploy our container in Azure in isolation without having to manage the infrastructure behind it. This gave us the motivation and insight we needed to prepare the deployment on an Azure Kubernetes Service.[3]
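As an illustration, the following is a rough sketch of the container-group YAML that Azure Container Instances accepts via az container create --file. All names, sizes, the region, and the API version are hypothetical placeholders and may need adjusting; the registry credentials are only required because the image lives in a private Azure Container Registry.

apiVersion: '2019-12-01'           # container-group API version; adjust to a current one
name: legacy-app-aci               # hypothetical container group name
location: westeurope               # hypothetical region
type: Microsoft.ContainerInstance/containerGroups
properties:
  osType: Windows                  # the legacy application ships in a Windows container
  restartPolicy: Always
  imageRegistryCredentials:        # access to the private Azure Container Registry
    - server: myregistry.azurecr.io
      username: myregistry
      password: <acr-password>
  containers:
    - name: legacy-app
      properties:
        image: myregistry.azurecr.io/legacy-app:1.0   # hypothetical image
        resources:
          requests:
            cpu: 2                 # hypothetical sizing
            memoryInGB: 4

Deploying this with az container create and checking the container logs already tells you whether the image can be pulled and whether the application starts.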
Manual provisioning of the resources and high costs
The final implementation realized the solution idea containing an Azure Kubernetes Service. It was the first time any of our team members had created a Kubernetes Service, which we provisioned through the Azure Portal. It worked, and we could execute our use case. After a few days, however, we noticed a significant increase in the allocated costs. One major cost driver was the Kubernetes Service, which had been running nonstop since the day we created it. Some people in our company were not pleased, so we needed to take action. Nobody wanted to touch or modify the operational Kubernetes Service because we needed it for demonstrations, but what were we to do? The solution was an “Infrastructure as Code” approach, and we chose Terraform. We quickly created a Terraform configuration that defined the resources and integrated it into an Azure DevOps pipeline, allowing everyone to provision and destroy the resources. Once the Terraform configuration and the pipeline were in place, we were able to remove the manually created Kubernetes Service, container registry, and so on.
Without knowledge of “Infrastructure as Code” approaches, you are likely to provision the essential services manually, especially at the start of a cloud migration journey. One significant disadvantage of doing so is that people hesitate to make changes once a stable setup has been reached. And, of course, keeping services running that require a large amount of computing and storage resources drives up costs. It is therefore advisable to introduce “Infrastructure as Code” from the beginning. Use your resources only when you need them; otherwise, destroy them. Keep the configuration files in a version control system such as Git. Introduce pipelines so that every member of your team can provision and destroy the resources (a sketch of such a pipeline follows below). Estimate your costs ahead of time and keep track of your running resources. This should protect you from unpleasant financial surprises.[4]
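A minimal sketch of what such an Azure DevOps pipeline definition can look like, assuming the Terraform files live in an infra folder of the repository, Terraform is available on the build agent, the Terraform state is kept in a remote backend so that apply and destroy runs see the same state, and service-principal credentials are stored as secret pipeline variables (all of these names are hypothetical):

parameters:
  - name: action                   # choose whether to create or tear down the environment
    type: string
    default: apply
    values: [apply, destroy]

trigger: none                      # run manually so that resources only exist when needed

pool:
  vmImage: ubuntu-latest

steps:
  - script: |
      terraform init
      terraform ${{ parameters.action }} -auto-approve
    displayName: Provision or destroy the Azure resources
    workingDirectory: infra        # hypothetical folder containing the .tf files
    env:                           # secret variables must be mapped explicitly
      ARM_CLIENT_ID: $(armClientId)
      ARM_CLIENT_SECRET: $(armClientSecret)
      ARM_TENANT_ID: $(armTenantId)
      ARM_SUBSCRIPTION_ID: $(armSubscriptionId)

Anyone on the team can then run the pipeline with the action parameter set to apply before a demonstration and to destroy afterwards.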
Conclusion
In retrospect, three major aspects of our cloud migration journey stand out:
- Choosing the cloud migration strategy: The decision to use the “lift and shift (rehost)” approach was influenced by the limited time, a small number of team members, and the fact that it was about running a proof of concept rather than finding a solution for a final productive environment.
- Starting with a more lightweight solution: Starting with a “complex” microservice that included a Kubernetes Service was probably too ambitious. The first goal should be to quickly get the container (including your application) running in the cloud without having to worry about provisioning and managing the necessary infrastructure. Azure Container Instances, for example, seem to be ideal for this. Demonstrate that the container can run in isolation in the cloud. After that, think about more advanced infrastructure solutions.
- Introducing infrastructure as code and reducing costs: You should not be afraid of adapting or destroying the resources provided. Enable an automated method of creating your environment that can be reproduced and whose configuration is securely stored in a version control system. Provide resources when needed, destroy them otherwise, and keep track of them. This saves money.
This cloud migration journey was a formative experience that addressed a variety of issues which have since become important in our everyday work.
References
[1] “Using Windows Containers to ‘Containerize’ Existing Applications.” Microsoft Learn: Lift and shift to Windows containers, 16 December 2022, https://learn.microsoft.com/en-us/virtualization/windowscontainers/quick-start/lift-shift-to-containers. Accessed 15 February 2023.
[2] “Assigning Pods to Nodes.” Kubernetes, https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector. Accessed 15 February 2023.
[3] “Serverless containers in Azure – Azure Container Instances.” Microsoft Learn, 25 October 2022, https://learn.microsoft.com/en-us/azure/container-instances/container-instances-overview. Accessed 15 February 2023.
[4] “What is Infrastructure as Code with Terraform?” HashiCorp Developer, https://developer.hashicorp.com/terraform/tutorials/azure-get-started/infrastructure-as-code. Accessed 15 February 2023.